CS 719 - Data Science Project
Harpreet Singh
Forecasting will be performed on two weather variables: Average Temperature and Average Wind Speed.
Importing all the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
#import pmdarima as pm
from statsmodels.graphics.api import qqplot
from statsmodels.tsa.ar_model import AutoReg, ar_select_order, AutoRegResults
import matplotlib as m
import seaborn as sns
!pip install pmdarima
Successfully installed pmdarima-2.0.3
Reading the dataset
As part of the first, temporal approach, we use data from our target station in Toronto (CYTZ).
df = pd.read_excel("/content/Torronto Daily.xlsx")
df.head()
| station | day | max_temp_f | min_temp_f | precip_in | avg_wind_speed_kts | avg_wind_drct | max_wind_speed_kts | max_wind_gust_kts | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | CYTZ | 2017-01-01 | 37.4 | 32 | None | 11.956522 | 245.61368 | 21 | 28 |
| 1 | CYTZ | 2017-01-02 | 41 | 32 | None | 11.34296 | 64.91508 | 17 | 21 |
| 2 | CYTZ | 2017-01-03 | 41 | 39.2 | None | 8.44124 | 57.301846 | 12 | None |
| 3 | CYTZ | 2017-01-04 | 39.2 | 19.4 | None | 20.355797 | 257.71188 | 33 | 42 |
| 4 | CYTZ | 2017-01-05 | 23 | 15.8 | None | 18.696377 | 254.1532 | 28 | 36 |
Checking for Null Values.
In this dataset, null values are represented by the string "None", so we create a custom function to count them.
def checkforNulls(df):
    # Null values are stored as the string "None", so count those per column
    for i in df.columns:
        print("\n Column : {}".format(i))
        count = 0
        for j in df[i]:
            if j == "None":
                count = count + 1
        print("Null count : ", count)
checkforNulls(df)
Column : station  Null count : 0
Column : day  Null count : 0
Column : max_temp_f  Null count : 2
Column : min_temp_f  Null count : 2
Column : precip_in  Null count : 2210
Column : avg_wind_speed_kts  Null count : 1
Column : avg_wind_drct  Null count : 1
Column : max_wind_speed_kts  Null count : 1
Column : max_wind_gust_kts  Null count : 408
These two columns (max_wind_gust_kts, precip_in) have a large number of null values, so we drop them.
df.drop(['max_wind_gust_kts','precip_in'],axis=1, inplace=True)
checkforNulls(df)
Column : station  Null count : 0
Column : day  Null count : 0
Column : max_temp_f  Null count : 2
Column : min_temp_f  Null count : 2
Column : avg_wind_speed_kts  Null count : 1
Column : avg_wind_drct  Null count : 1
Column : max_wind_speed_kts  Null count : 1
Using the forward fill method for imputation of null values
df = df.replace('None', np.nan)
df.ffill(inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2212 entries, 0 to 2211
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   station             2212 non-null   object
 1   day                 2212 non-null   datetime64[ns]
 2   max_temp_f          2212 non-null   float64
 3   min_temp_f          2212 non-null   float64
 4   avg_wind_speed_kts  2212 non-null   float64
 5   avg_wind_drct       2212 non-null   float64
 6   max_wind_speed_kts  2212 non-null   float64
dtypes: datetime64[ns](1), float64(5), object(1)
memory usage: 121.1+ KB
wTor = df.copy() #Creating dataset copy
wTor.set_index("day",inplace=True) #setting column day as index
wTor.drop("station", axis=1, inplace=True) #dropping the station name as its just one station
wTor.head()
| max_temp_f | min_temp_f | avg_wind_speed_kts | avg_wind_drct | max_wind_speed_kts | |
|---|---|---|---|---|---|
| day | |||||
| 2017-01-01 | 37.4 | 32.0 | 11.956522 | 245.613680 | 21.0 |
| 2017-01-02 | 41.0 | 32.0 | 11.342960 | 64.915080 | 17.0 |
| 2017-01-03 | 41.0 | 39.2 | 8.441240 | 57.301846 | 12.0 |
| 2017-01-04 | 39.2 | 19.4 | 20.355797 | 257.711880 | 33.0 |
| 2017-01-05 | 23.0 | 15.8 | 18.696377 | 254.153200 | 28.0 |
wTor.columns = ["MaxT","MinT","AvgWind","WindDir","MaxWind"] # Renaming the columns
Creating average daily temperature column
def avg_temp(day):
    # Daily average as the midpoint of the max and min temperatures
    return (day.MaxT + day.MinT) / 2
wTor['AvgT'] = wTor.apply(avg_temp,axis=1)
wTor.loc[wTor.AvgT <= 0]["AvgT"]
day
2018-01-05   -0.4
2018-01-06   -1.3
Name: AvgT, dtype: float64
We have only two rows with negative average temperatures, so we set those to 0 to avoid inconsistencies.
wTor.loc[wTor.AvgT <= 0, "AvgT"] = 0 # .loc with both indexers avoids the chained-assignment SettingWithCopyWarning
Getting description of the dataset
wTor.describe()
| MaxT | MinT | AvgWind | WindDir | MaxWind | AvgT | |
|---|---|---|---|---|---|---|
| count | 2212.000000 | 2212.000000 | 2212.000000 | 2212.000000 | 2212.000000 | 2212.000000 |
| mean | 54.565913 | 43.326492 | 9.402608 | 192.024625 | 16.782098 | 48.946203 |
| std | 18.182012 | 17.143949 | 4.086955 | 98.971972 | 6.182911 | 17.484380 |
| min | 5.000000 | -7.600000 | 1.866667 | 2.021540 | 5.000000 | -1.300000 |
| 25% | 39.200000 | 32.000000 | 6.378805 | 77.883904 | 12.000000 | 35.600000 |
| 50% | 53.600000 | 42.800000 | 8.521739 | 222.487950 | 16.000000 | 47.300000 |
| 75% | 71.600000 | 59.000000 | 11.640399 | 273.772427 | 20.000000 | 65.300000 |
| max | 95.000000 | 77.000000 | 32.034100 | 359.808380 | 44.000000 | 85.100000 |
wTor.sort_values(by="AvgWind",ascending=False)
| MaxT | MinT | AvgWind | WindDir | MaxWind | AvgT | |
|---|---|---|---|---|---|---|
| day | ||||||
| 2018-04-15 | 33.8 | 26.6 | 32.034100 | 64.988990 | 41.0 | 30.2 |
| 2022-12-24 | 17.6 | 8.6 | 27.904348 | 244.861430 | 36.0 | 13.1 |
| 2019-02-12 | 35.6 | 23.0 | 26.939290 | 74.219345 | 37.0 | 29.3 |
| 2019-12-01 | 33.8 | 30.2 | 26.405160 | 65.503500 | 36.0 | 32.0 |
| 2022-03-23 | 39.2 | 35.6 | 25.925686 | 65.921890 | 35.0 | 37.4 |
| ... | ... | ... | ... | ... | ... | ... |
| 2019-06-28 | 77.0 | 64.4 | 2.646282 | 239.516660 | 15.0 | 70.7 |
| 2018-06-09 | 71.6 | 57.2 | 2.608696 | 222.531900 | 6.0 | 64.4 |
| 2021-06-08 | 75.2 | 62.6 | 2.480315 | 217.447720 | 6.0 | 68.9 |
| 2019-09-17 | 69.8 | 55.4 | 2.478261 | 144.767490 | 6.0 | 62.6 |
| 2020-11-08 | 64.4 | 44.6 | 1.866667 | 219.673780 | 6.0 | 54.5 |
2212 rows × 6 columns
As part of the data analysis, we create two more attributes that flag winter and summer months, so that we can analyze the data in more detail and see the impact of weather patterns across seasons.
Winter ~ Oct-Apr
Summer ~ May-Sept
wTor_season = wTor.copy(deep=True)
wTor_season['month'] = wTor_season.index.month
tCal = (wTor_season['month'] >= 5) & (wTor_season['month'] <= 9)
wTor_season['summer'] = np.where(tCal,1,0)
wTor_season['winter'] = np.where(wTor_season['summer'] != 1,1,0)
wTor_season
| MaxT | MinT | AvgWind | WindDir | MaxWind | AvgT | month | summer | winter | |
|---|---|---|---|---|---|---|---|---|---|
| day | |||||||||
| 2017-01-01 | 37.4 | 32.0 | 11.956522 | 245.613680 | 21.0 | 34.7 | 1 | 0 | 1 |
| 2017-01-02 | 41.0 | 32.0 | 11.342960 | 64.915080 | 17.0 | 36.5 | 1 | 0 | 1 |
| 2017-01-03 | 41.0 | 39.2 | 8.441240 | 57.301846 | 12.0 | 40.1 | 1 | 0 | 1 |
| 2017-01-04 | 39.2 | 19.4 | 20.355797 | 257.711880 | 33.0 | 29.3 | 1 | 0 | 1 |
| 2017-01-05 | 23.0 | 15.8 | 18.696377 | 254.153200 | 28.0 | 19.4 | 1 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-01-17 | 39.2 | 33.8 | 10.680435 | 56.813300 | 20.0 | 36.5 | 1 | 0 | 1 |
| 2023-01-18 | 42.8 | 37.4 | 10.594360 | 284.708860 | 17.0 | 40.1 | 1 | 0 | 1 |
| 2023-01-19 | 39.2 | 35.6 | 20.628057 | 64.660164 | 30.0 | 37.4 | 1 | 0 | 1 |
| 2023-01-20 | 39.2 | 32.0 | 9.893478 | 317.770900 | 19.0 | 35.6 | 1 | 0 | 1 |
| 2023-01-21 | 33.8 | 30.2 | 8.504348 | 293.190100 | 13.0 | 32.0 | 1 | 0 | 1 |
2212 rows × 9 columns
# timeseries plot - all data
wTor_season[["AvgWind", "MaxWind", "AvgT"]].plot(figsize=(8,6))
plt.show()
wTor_season.loc["2021-01-01":,["AvgWind", "MaxWind", "AvgT"]].plot(figsize=(8,6))
plt.figure(figsize=(8,6))
wTor_season.AvgWind.hist(bins=40, alpha=0.6, label='AvgWind')
wTor_season.MaxWind.hist(bins=40, alpha=0.6, label='MaxWind')
wTor_season.AvgT.hist(bins=40, alpha=0.8, label='AvgT')
plt.legend()
plt.show()
Plotting heatmap to check for correlations
sns.heatmap(wTor_season.corr(), annot=True)
<Axes: >
Creating a plot to see correlation between Avg Wind and Avg Temp
sns.histplot(data=wTor_season, x="AvgWind",y="AvgT")
<Axes: xlabel='AvgWind', ylabel='AvgT'>
Creating a plot to see correlation between Avg Temp and Month
sns.lineplot(data=wTor_season[:], y="AvgT", x="month")
<Axes: xlabel='month', ylabel='AvgT'>
Creating a plot to see correlation between Avg Wind and Month
sns.lineplot(data=wTor_season[:], y="AvgWind", x="month")
<Axes: xlabel='month', ylabel='AvgWind'>
Creating a plot to see correlation between Wind Direction and Month
sns.lineplot(data=wTor_season[:], y="WindDir", x="month")
<Axes: xlabel='month', ylabel='WindDir'>
Creating a plot of Average Wind with respect to different seasons
plt.figure(figsize=(8,6))
wTor_season[wTor_season['summer'] == 1]['AvgWind'].hist(bins=60, alpha=0.8, label='summer')
wTor_season[wTor_season['winter'] == 1]['AvgWind'].hist(bins=60, alpha=0.8, label='winter')
plt.legend()
plt.show()
Plotting temperature attribute based on the seasons
plt.figure(figsize=(8,6))
wTor_season[wTor_season['summer'] == 1]['AvgT'].hist(bins=60, alpha=0.8, label='summer')
wTor_season[wTor_season['winter'] == 1]['AvgT'].hist(bins=60, alpha=0.8, label='winter')
plt.legend()
plt.show()
plt.figure(figsize=(8,6))
wTor_season[wTor_season['summer'] == 1]['WindDir'].hist(bins=60, alpha=0.8, label='summer')
wTor_season[wTor_season['winter'] == 1]['WindDir'].hist(bins=60, alpha=0.4, label='winter')
plt.legend()
plt.show()
Creating a function for model evaluation
def model_performance(test, predictions):
    # Root mean squared error, plus RMSE normalized by the range of the actuals
    mse = ((predictions - test) ** 2).mean()
    normalizedrmse = np.sqrt(mse) / (test.max() - test.min())
    print('\nRMSE : {}'.format(round(np.sqrt(mse), 2)) +
          '\nNormalized RMSE : {}'.format(round(normalizedrmse, 2)))
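A quick sanity check of this helper on made-up numbers (the arrays below are illustrative only, not from the dataset):
actual = np.array([50.0, 52.0, 49.0, 55.0]) # hypothetical actual values
predicted = np.array([51.0, 50.5, 50.0, 54.0]) # hypothetical predictions
model_performance(actual, predicted) # prints RMSE in the series' units and RMSE scaled by the data range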
Creating a function for series decomposition
def decompose_series(s):
    # Additive decomposition with a yearly (365-day) period
    dcompseries = seasonal_decompose(s, period=365)
    dcompseries.plot()
    plt.show()
Decomposing Series of Average Wind into Trend, Seasonality, and Residuals
series_decomp = seasonal_decompose(wTor_season['AvgWind'], model='additive', period=int(365), extrapolate_trend='freq')
#Assigning decomposed variables
trend = series_decomp.trend
seasonal = series_decomp.seasonal
residual = series_decomp.resid
series_decomp.plot()
plt.show()
Decomposing Series of Average Temp into Trend, Seasonality, and Residuals
decompose_series(wTor_season["AvgT"])
Performing the Augmented Dickey-Fuller test to check the stationarity of Average Wind
adfVal = adfuller(wTor_season['AvgWind'], autolag = 'AIC')
print("1. ADF : ",adfVal[0])
print("2. P-Value : ", adfVal[1])
print("3. Num Of Lags : ", adfVal[2])
print("4. Num Of Observations", adfVal[3])
print("5. Critical Values :")
for key, val in adfVal[4].items():
    print("\t", key, ": ", val)
1. ADF :  -3.8860715870330726
2. P-Value :  0.0021397776318647873
3. Num Of Lags :  25
4. Num Of Observations 2186
5. Critical Values :
	 1% :  -3.433344965914077
	 5% :  -2.8628630765096195
	 10% :  -2.567474338973205
The p-value is below 0.05 and the ADF statistic is below the 5% critical value, so the Average Wind series is stationary.
Performing the Augmented Dickey-Fuller test to check the stationarity of Average Temp
adfVal = adfuller(wTor_season['AvgT'], autolag = 'AIC')
print("1. ADF : ",adfVal[0])
print("2. P-Value : ", adfVal[1])
print("3. Num Of Lags : ", adfVal[2])
print("4. Num Of Observations", adfVal[3])
print("5. Critical Values :")
for key, val in adfVal[4].items():
    print("\t", key, ": ", val)
1. ADF :  -2.668059468518559
2. P-Value :  0.07973032377199007
3. Num Of Lags :  12
4. Num Of Observations 2199
5. Critical Values :
	 1% :  -3.433327239607306
	 5% :  -2.8628552495229194
	 10% :  -2.5674701716229276
Here the p-value is above 0.05, so we cannot reject non-stationarity for Average Temp at the 5% level.
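First differencing is the usual remedy for this, and the d=1 term in the SARIMA orders used later applies exactly that. A minimal sketch re-running the test on the differenced series:
# First-difference AvgT and re-run the ADF test; differencing typically removes the trend
avgT_diff = wTor_season['AvgT'].diff().dropna()
adf_diff = adfuller(avgT_diff, autolag='AIC')
print("ADF : ", adf_diff[0])
print("P-Value : ", adf_diff[1])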
adfVal = adfuller(wTor_season['MinT'], autolag = 'AIC')
print("1. ADF : ",adfVal[0])
print("2. P-Value : ", adfVal[1])
print("3. Num Of Lags : ", adfVal[2])
print("4. Num Of Observations", adfVal[3])
print("5. Critical Values :")
for key, val in adfVal[4].items():
    print("\t", key, ": ", val)
1. ADF :  -2.537576611124574
2. P-Value :  0.10659711764514024
3. Num Of Lags :  20
4. Num Of Observations 2191
5. Critical Values :
	 1% :  -3.433338123180619
	 5% :  -2.862860055130884
	 10% :  -2.567472730288902
Plotting the autocorrelation function (ACF) for Average Temp
plot_acf(wTor_season['AvgT'], lags=100)
plt.show()
Plotting the autocorrelation function (ACF) for Average Wind
plot_acf(wTor_season['AvgWind'], lags=100)
plt.show()
plot_acf(wTor_season['MinT'], lags=100)
plt.show()
Plotting the partial autocorrelation function (PACF) for Average Temp
plot_pacf(wTor_season['AvgT'], lags=20)
plt.show()
Plotting the partial autocorrelation function (PACF) for Average Wind
plot_pacf(wTor_season['AvgWind'], lags=20)
plt.show()
plot_pacf(wTor_season['MinT'], lags=20)
plt.show()
Performing a train-test split on the data
train = wTor_season[:int(len(wTor_season)*0.90)]
test = wTor_season[int(len(wTor_season)*0.90):]
import pmdarima as pm
SARIMAX Model
Finding the best parameter values of p,d,q for SARIMAX model
import itertools
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
Using the Akaike Information Criterion to find optimal parameters for the Average Wind series
import warnings
warnings.filterwarnings("ignore") # specify to ignore warning messages
aic = []
prm = []
prms = []
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(train.AvgWind,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            aic.append(results.aic)
            prm.append(param)
            prms.append(param_seasonal)
            print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            continue
Grid search results for AvgWind (64 SARIMA configurations evaluated; five lowest AIC shown, full log omitted):
SARIMA(1, 1, 1)x(0, 1, 1, 12) - AIC:10727.826250165614
SARIMA(1, 1, 1)x(1, 1, 1, 12) - AIC:10729.705461948259
SARIMA(1, 1, 1)x(0, 0, 1, 12) - AIC:10736.26541412025
SARIMA(1, 1, 1)x(1, 0, 1, 12) - AIC:10737.977090510652
SARIMA(1, 1, 1)x(1, 0, 0, 12) - AIC:10740.377119603902
Using the Akaike Information Criterion to find optimal parameters for the Average Temp series
import warnings
warnings.filterwarnings("ignore") # specify to ignore warning messages
aic = []
prm = []
prms = []
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(train.AvgT,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            aic.append(results.aic)
            prm.append(param)
            prms.append(param_seasonal)
            print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            continue
Grid search results for AvgT (64 SARIMA configurations evaluated; five lowest AIC shown, full log omitted):
SARIMA(1, 1, 1)x(1, 1, 1, 12) - AIC:12010.770707466076
SARIMA(1, 1, 1)x(0, 1, 1, 12) - AIC:12013.467681464645
SARIMA(1, 1, 1)x(0, 0, 1, 12) - AIC:12019.418596202997
SARIMA(1, 1, 1)x(1, 0, 1, 12) - AIC:12020.142985547593
SARIMA(1, 1, 1)x(1, 0, 0, 12) - AIC:12027.4534571619
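As an alternative to this manual grid, pmdarima (installed and imported above) can run a stepwise AIC search automatically. A minimal sketch, assuming the same training series and the same 12-period seasonality used in the grid:
# Stepwise search over (p,d,q)x(P,D,Q,12) guided by AIC; it usually explores far fewer
# combinations than the full 64-model grid above
auto_model = pm.auto_arima(train.AvgT, seasonal=True, m=12,
                           max_p=1, max_d=1, max_q=1,
                           information_criterion='aic',
                           stepwise=True, suppress_warnings=True)
print(auto_model.summary())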
a = pd.DataFrame(columns=["Parms","sParms", "AIC"])
a["Parms"] = prm
a["sParms"] = prms
a["AIC"] = aic
#print(a.AIC.sort_values().head(60))
# Scatter of AIC for each (p,d,q)x(P,D,Q,s) combination, annotated with its value
plt.figure(figsize=(16, 8))
plt.scatter(a.index, a.AIC)
for x, y in zip(a.index, a.AIC):
    plt.annotate("{:.2f}".format(y), (x, y))
plt.show()
'''
residuals = train.AvgT.copy(deep=True)
residuals.index = pd.DatetimeIndex(residuals.index).to_period('D')
mod = ar_select_order(residuals, maxlag=40, ic='aic', old_names=True)
aic = []
for key, val in mod.aic.items():
    if key != 0:
        aic.append((key[-1], val))
aic.sort()
x,y = [x for x,y in aic],[y for x,y in aic]
plt.scatter(x, y)
plt.plot([0,40],[y[15],y[15]], 'tab:orange')
plt.text(3,y[15]+0.002, '{0}'.format(round(y[15],3)),color='tab:orange')
plt.plot([0,40],[y[20],y[20]], 'k--')
plt.text(3,y[20]-0.004, '{0}'.format(round(y[20],3)))
plt.title("AIC Criterion")
plt.xlabel("Lags in AR Model")
plt.ylabel("AIC")
plt.show()
'''
Univariate SARIMAX model fitting with the Walk Forward method
Walk Forward Method: we first forecast 7 days; after making those predictions, we append the corresponding 7 days of actual data to the training set and forecast the next 7 days. Repeating this four times gives a 28-day forecast in total.
def univariate_sarimax(s, n):
    predictions = np.array([])
    train, test = s[:n], s[n:]
    day_list = [7, 14, 21, 28]  # weeks 1, 2, 3, 4
    for i in day_list:
        j = i - 7
        model = sm.tsa.statespace.SARIMAX(train, order=(0, 1, 1),
                                          seasonal_order=(1, 1, 1, 30)).fit(maxiter=50,
                                                                            method='powell')
        # Forecasting the next 7 days
        forecast = model.get_forecast(steps=7, dynamic=False)
        predictions = np.concatenate((predictions, forecast), axis=None)
        model.plot_diagnostics(figsize=(15, 12))
        # Walk forward: append the 7 days of actuals to the training data
        train = np.concatenate((train, test[j:i]), axis=None)
    return predictions
def get_vals(preds):
    # Flatten the per-week forecast objects into one flat list of predicted means
    pp = []
    for p in preds:
        pp.extend(list(p.predicted_mean))
    return pp
Forecasting univariate Average Temperature with the Walk Forward method
preds = univariate_sarimax(wTor_season.AvgT, 1990)
pp = get_vals(preds)
Optimization terminated successfully.
Current function value: 3.080517
Iterations: 4
Function evaluations: 178
Optimization terminated successfully.
Current function value: 3.080841
Iterations: 4
Function evaluations: 180
Optimization terminated successfully.
Current function value: 3.080801
Iterations: 4
Function evaluations: 182
Optimization terminated successfully.
Current function value: 3.080106
Iterations: 4
Function evaluations: 183
dates = pd.date_range(wTor_season.iloc[1990].name, periods=28, freq='1D')
predictions = pd.DataFrame(data=pd.Series(pp).explode().reset_index())
predictions.drop("index",axis=1, inplace=True)
predictions.set_index(dates,inplace=True)
Plotting the Average Temperature Predictions and Actual Data
plt.figure(figsize=(16,4))
plt.plot(wTor_season.AvgT[1800:2018], label="Actual")
plt.plot(predictions, label="Predicted")
plt.title('Average Temperature forecast', fontsize=20)
plt.ylabel('Average Temp', fontsize=16)
plt.legend()
<matplotlib.legend.Legend at 0x7ff4cebd4820>
plt.figure(figsize=(16,4))
plt.plot(wTor_season.AvgT[1990:2018], label="Actual")
plt.plot(predictions, label="Predicted")
plt.title('Average Temperature forecast (with walk forward)', fontsize=20)
plt.ylabel('Average Temp', fontsize=16)
plt.legend()
<matplotlib.legend.Legend at 0x7ff4ceaf8eb0>
Checking for the model performance
model_performance(wTor_season.AvgT[1990:2018].values,predictions.values)
RMSE : 6.33
Normalized RMSE : 0.39
Forecasting univariate Average Wind with the Walk Forward method
preds = univariate_sarimax(wTor_season.AvgWind, 1990)
pp = get_vals(preds)
Optimization terminated successfully.
Current function value: 2.728061
Iterations: 4
Function evaluations: 185
Optimization terminated successfully.
Current function value: 2.729830
Iterations: 4
Function evaluations: 184
Optimization terminated successfully.
Current function value: 2.729272
Iterations: 4
Function evaluations: 182
Optimization terminated successfully.
Current function value: 2.728387
Iterations: 4
Function evaluations: 179
predictions = pd.DataFrame(data=pd.Series(pp).explode().reset_index())
predictions.drop("index",axis=1, inplace=True)
predictions.set_index(dates,inplace=True)
plt.figure(figsize=(16,4))
plt.plot(wTor_season.AvgWind[1990:2018], label="Actual")
plt.plot(predictions, label="Predicted")
plt.title('Average Wind forecast (with Walk forward)', fontsize=20)
plt.ylabel('Average Wind', fontsize=16)
plt.legend()
<matplotlib.legend.Legend at 0x7ff4d2e79df0>
model_performance(wTor_season.AvgWind[1990:2018].values,predictions.values)
RMSE : 3.62
Normalized RMSE : 0.25
Forecasting univariate Average Temperature without the Walk Forward method
model = sm.tsa.statespace.SARIMAX(train.AvgT, order=(0, 1, 1),
                                  seasonal_order=(1, 1, 1, 30), simple_differencing=False,
                                  enforce_stationarity=False,
                                  enforce_invertibility=False,
                                  ).fit(maxiter=50, method='powell')
# alternative: method='innovations_mle', low_memory=True
# Forecasting the next 30 days in one shot
forecast1 = model.get_forecast(steps=30, dynamic=False)
Optimization terminated successfully.
Current function value: 3.017680
Iterations: 3
Function evaluations: 138
model.plot_diagnostics(figsize=(15, 12))
forecast1.predicted_mean.plot()
<Axes: title={'center': 'Correlogram'}>
Univariate forecast evaluation on Temperature without Walk Forward
model_performance(test.AvgT[:30], forecast1.predicted_mean)
RMSE : 8.07
Normalized RMSE : 0.5
model = sm.tsa.statespace.SARIMAX(train.AvgWind, order=(0, 1, 1),
                                  seasonal_order=(1, 1, 1, 30), simple_differencing=False,
                                  enforce_stationarity=False,
                                  enforce_invertibility=False,
                                  ).fit(maxiter=50, method='powell')
# alternative: method='innovations_mle', low_memory=True
# Forecasting the next 30 days in one shot
forecast1 = model.get_forecast(steps=30, dynamic=False)
Optimization terminated successfully.
Current function value: 2.677364
Iterations: 5
Function evaluations: 271
model.plot_diagnostics(figsize=(15, 12))
forecast1.predicted_mean.plot()
<Axes: title={'center': 'Correlogram'}>
Univariate forecast evaluation on Wind without Walk Forward
model_performance(test.AvgWind[:30], forecast1.predicted_mean)
RMSE : 3.55
Normalized RMSE : 0.25
Multivariate SARIMAX: forecasting temperature with Average Wind as an exogenous feature (with Walk Forward)
def Multivariate_temp_wf(s, n, ex):
    predictions = np.array([])
    train, test = s[:n], s[n:]
    extrain, extest = ex[:n], ex[n:]
    day_list = [7, 14, 21, 28]  # weeks 1, 2, 3, 4
    for i in day_list:
        j = i - 7
        model = sm.tsa.statespace.SARIMAX(train, exog=extrain, order=(1, 1, 1),
                                          seasonal_order=(1, 1, 1, 30)).fit(maxiter=50,
                                                                            method='powell')
        # Forecast the next 7 days, supplying the exogenous values for the same window
        # (the slice advances with the walk-forward window)
        forecast = model.get_forecast(steps=7, dynamic=False, exog=extest[j:i])
        predictions = np.concatenate((predictions, forecast), axis=None)
        model.plot_diagnostics(figsize=(15, 12))
        # Walk forward: append the 7 days of actuals and exogenous values to training
        train = np.concatenate((train, test[j:i]), axis=None)
        extrain = np.concatenate((extrain, extest[j:i]), axis=None)
    return predictions
preds = Multivariate_temp_wf(wTor_season.AvgT,1990,wTor_season.AvgWind)
pp = get_vals(preds)
plt.figure(figsize=(16,4))
plt.plot(wTor_season.AvgT[1990:2018].values, label="Actual")
plt.plot(pd.Series(pp).explode().reset_index()[0].values, label="Predicted")
plt.title('Average Temperature Forecast (with Walk Forward)', fontsize=20)
plt.ylabel('Average Temp', fontsize=16)
plt.legend()
Optimization terminated successfully.
Current function value: 3.029942
Iterations: 5
Function evaluations: 315
Optimization terminated successfully.
Current function value: 3.030211
Iterations: 5
Function evaluations: 315
Optimization terminated successfully.
Current function value: 3.030145
Iterations: 5
Function evaluations: 316
Optimization terminated successfully.
Current function value: 3.029371
Iterations: 5
Function evaluations: 317
<matplotlib.legend.Legend at 0x7ff4cebf3d90>
model_performance(wTor_season.AvgT[1990:2018].values,pd.Series(pp).explode().reset_index()[0].values)
RMSE : 5.95
Normalized RMSE : 0.37
Multivariate SARIMAX: forecasting temperature with Month as an exogenous feature (with Walk Forward)
preds = Multivariate_temp_wf(wTor_season.AvgT,1990,wTor_season.month)
pp = get_vals(preds)
plt.figure(figsize=(16,4))
plt.plot(wTor_season.AvgT[1990:2018].values, label="Actual")
plt.plot(pd.Series(pp).explode().reset_index()[0].values, label="Predicted")
plt.title('Average Temperature Forecast (with walk forward)', fontsize=20)
plt.ylabel('Average Temp', fontsize=16)
plt.legend()
Optimization terminated successfully.
Current function value: 3.029547
Iterations: 5
Function evaluations: 320
Optimization terminated successfully.
Current function value: 3.029812
Iterations: 5
Function evaluations: 321
Optimization terminated successfully.
Current function value: 3.029747
Iterations: 5
Function evaluations: 321
Optimization terminated successfully.
Current function value: 3.028987
Iterations: 5
Function evaluations: 319
<matplotlib.legend.Legend at 0x7ff4d2dc3eb0>
model_performance(wTor_season.AvgT[1990:2018].values,pd.Series(pp).explode().reset_index()[0].values)
RMSE : 5.99
Normalized RMSE : 0.37
Multivariate SARIMAX: forecasting wind with Month as an exogenous feature (with Walk Forward)
def Multivariate_wind_wf(s, n, ex):
    predictions = np.array([])
    train, test = s[:n], s[n:]
    extrain, extest = ex[:n], ex[n:]
    day_list = [7, 14, 21, 28]  # weeks 1, 2, 3, 4
    for i in day_list:
        j = i - 7
        model = sm.tsa.statespace.SARIMAX(train, exog=extrain, order=(1, 1, 1),
                                          seasonal_order=(1, 1, 1, 7)).fit(maxiter=50,
                                                                           method='powell')
        # Forecast the next 7 days, supplying the exogenous values for the same window
        forecast = model.get_forecast(steps=7, dynamic=False, exog=extest[j:i])
        predictions = np.concatenate((predictions, forecast), axis=None)
        model.plot_diagnostics(figsize=(15, 12))
        # Walk forward: append the 7 days of actuals and exogenous values to training
        train = np.concatenate((train, test[j:i]), axis=None)
        extrain = np.concatenate((extrain, extest[j:i]), axis=None)
    return predictions
preds = Multivariate_wind_wf(wTor_season.AvgWind,1990,wTor_season.month)
pp = get_vals(preds)
plt.figure(figsize=(16,4))
plt.plot(wTor_season.AvgWind[1990:2018].values, label="Actual")
plt.plot(pd.Series(pp).explode().reset_index()[0].values, label="Predicted")
plt.title('Average Wind Forecast (with walk forward)', fontsize=20)
plt.ylabel('Average Wind', fontsize=16)
plt.legend()
Optimization terminated successfully.
Current function value: 2.748505
Iterations: 4
Function evaluations: 308
Optimization terminated successfully.
Current function value: 2.749108
Iterations: 4
Function evaluations: 305
Optimization terminated successfully.
Current function value: 2.748692
Iterations: 4
Function evaluations: 300
Optimization terminated successfully.
Current function value: 2.748735
Iterations: 4
Function evaluations: 314
<matplotlib.legend.Legend at 0x7ff4d2a9c070>
model_performance(wTor_season.AvgWind[1990:2018].values,pd.Series(pp).explode().reset_index()[0].values)
RMSE : 4.01
Normalized RMSE : 0.28
Multivariate SARIMAX: forecasting wind with Average Temperature as an exogenous feature (with Walk Forward)
preds = Multivariate_wind_wf(wTor_season.AvgWind,1990,wTor_season.AvgT)
pp = get_vals(preds)
plt.figure(figsize=(16,4))
plt.plot(wTor_season.AvgWind[1990:2018].values, label="Actual")
plt.plot(pd.Series(pp).explode().reset_index()[0].values, label="Predicted")
plt.title('Average Wind Forecast', fontsize=20)
plt.ylabel('Average Wind', fontsize=16)
plt.legend()
Optimization terminated successfully.
Current function value: 2.701267
Iterations: 5
Function evaluations: 357
Optimization terminated successfully.
Current function value: 2.702857
Iterations: 5
Function evaluations: 354
Optimization terminated successfully.
Current function value: 2.701726
Iterations: 5
Function evaluations: 357
Optimization terminated successfully.
Current function value: 2.701311
Iterations: 5
Function evaluations: 357
<matplotlib.legend.Legend at 0x7ff4ce207ca0>
model_performance(wTor_season.AvgWind[1990:2018].values,pd.Series(pp).explode().reset_index()[0].values)
RMSE : 3.56
Normalized RMSE : 0.25
Multivariate SARIMAX: forecasting Average Temperature with Month as an exogenous feature (without Walk Forward)
model = sm.tsa.statespace.SARIMAX(train.AvgT, exog=train.month, order=(0, 1, 0),
                                  seasonal_order=(0, 1, 0, 90), simple_differencing=False,
                                  enforce_stationarity=False,
                                  enforce_invertibility=False,
                                  ).fit(maxiter=50, method='powell')
# alternative: method='innovations_mle', low_memory=True
# Forecasting the next 30 days, supplying the matching exogenous values
forecast1 = model.get_forecast(steps=30, dynamic=False, exog=test.month[:30])
Optimization terminated successfully.
Current function value: 3.265688
Iterations: 3
Function evaluations: 101
model.plot_diagnostics(figsize=(15, 12))
forecast1.predicted_mean.plot()
<Axes: title={'center': 'Correlogram'}>
test.AvgT[:30].plot()
forecast1.predicted_mean.plot()
plt.show()
model_performance(test.AvgT[:30], forecast1.predicted_mean)
RMSE : 8.11
Normalized RMSE : 0.5
Multivariate SARIMAX: forecasting Average Temperature with Average Wind as an exogenous feature (without Walk Forward)
model = sm.tsa.statespace.SARIMAX(train.AvgT, exog=train.AvgWind, order=(0, 1, 0),
                                  seasonal_order=(0, 1, 0, 90), simple_differencing=False,
                                  enforce_stationarity=False,
                                  enforce_invertibility=False,
                                  ).fit(maxiter=50, method='powell')
# alternative: method='innovations_mle', low_memory=True
# Forecasting the next 30 days, supplying the matching exogenous values
forecast1 = model.get_forecast(steps=30, dynamic=False, exog=test.AvgWind[:30])
Optimization terminated successfully.
Current function value: 3.264116
Iterations: 2
Function evaluations: 75
model.plot_diagnostics(figsize=(15, 12))
forecast1.predicted_mean.plot()
<Axes: title={'center': 'Correlogram'}>
test.AvgT[:30].plot()
forecast1.predicted_mean.plot()
plt.show()
model_performance(test.AvgT[:30], forecast1.predicted_mean)
RMSE : 8.16
Normalized RMSE : 0.5
Multivariate SARIMAX: forecasting wind with Month as an exogenous feature (without Walk Forward)
model = sm.tsa.statespace.SARIMAX(train.AvgWind, exog=train.month, order=(1, 1, 1),
                                  seasonal_order=(1, 1, 1, 30), # simple_differencing=False,
                                  enforce_stationarity=False,
                                  enforce_invertibility=False,
                                  ).fit(maxiter=50, method='powell')
# alternative: method='innovations_mle', low_memory=True
# Forecasting the next 30 days, supplying the matching exogenous values
forecast1 = model.get_forecast(steps=30, dynamic=False, exog=test.month[:30])
Optimization terminated successfully.
Current function value: 2.656734
Iterations: 4
Function evaluations: 274
model.plot_diagnostics(figsize=(15, 12))
forecast1.predicted_mean.plot()
<Axes: title={'center': 'Correlogram'}>
test.AvgWind[:30].plot()
forecast1.predicted_mean.plot()
plt.show()
model_performance(test.AvgWind[:30], forecast1.predicted_mean)
RMSE : 3.59
Normalized RMSE : 0.25
Multivariate SARIMAX: forecasting wind with Average Temp as an exogenous feature (without Walk Forward)
model = sm.tsa.statespace.SARIMAX(train.AvgWind, exog=train.AvgT, order=(1, 1, 1),
                                  seasonal_order=(1, 1, 1, 7), simple_differencing=False,
                                  enforce_stationarity=False,
                                  enforce_invertibility=False,
                                  ).fit(maxiter=50, method='powell')
# alternative: method='innovations_mle', low_memory=True
# Forecasting the next 30 days, supplying the matching exogenous values
forecast1 = model.get_forecast(steps=30, dynamic=False, exog=test.AvgT[:30])
Optimization terminated successfully.
Current function value: 2.687521
Iterations: 5
Function evaluations: 330
model.plot_diagnostics(figsize=(15, 12))
forecast1.predicted_mean.plot()
<Axes: title={'center': 'Correlogram'}>
test.AvgWind[:30].plot()
forecast1.predicted_mean.plot()
plt.show()
model_performance(test.AvgWind[:30], forecast1.predicted_mean)
RMSE : 3.55
Normalized RMSE : 0.25
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.api import VAR
sns.heatmap(train.corr(),annot=True)
<Axes: >
Checking for causal relationships between variables for the VAR model using Granger causality tests
from statsmodels.tsa.stattools import grangercausalitytests
maxlag = 10
#test_name = 'ssr_chi2test'
def grangers_causation_matrix(data, variables, test='ssr_chi2test', verbose=False):
    """Check Granger causality for all possible combinations of the time series.
    Rows are the response variables, columns are the predictors. The values in the
    table are the p-values. A p-value below the significance level (0.05) means the
    null hypothesis (that the coefficients of the corresponding past values are zero,
    i.e. that X does not cause Y) can be rejected.
    data : pandas dataframe containing the time series variables
    variables : list containing names of the time series variables.
    """
    df = pd.DataFrame(np.zeros((len(variables), len(variables))), columns=variables, index=variables)
    for c in df.columns:
        for r in df.index:
            test_result = grangercausalitytests(data[[r, c]], maxlag=maxlag, verbose=False)
            p_values = [round(test_result[i+1][0][test][1], 4) for i in range(maxlag)]
            if verbose: print(f'Y = {r}, X = {c}, P Values = {p_values}')
            min_p_value = np.min(p_values)
            df.loc[r, c] = min_p_value
    df.columns = [var + '_x' for var in variables]
    df.index = [var + '_y' for var in variables]
    return df
#Ref : https://www.machinelearningplus.com/time-series/granger-causality-test-in-python/
grangers_causation_matrix(wTor_season, variables = wTor_season.columns)
| MaxT_x | MinT_x | AvgWind_x | WindDir_x | MaxWind_x | AvgT_x | month_x | summer_x | winter_x | |
|---|---|---|---|---|---|---|---|---|---|
| MaxT_y | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0123 | 0.0000 | 0.0000 |
| MinT_y | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0124 | 0.0000 | 0.0000 |
| AvgWind_y | 0.0000 | 0.0000 | 1.0000 | 0.0018 | 0.0000 | 0.0000 | 0.0001 | 0.0000 | 0.0000 |
| WindDir_y | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.2856 | 0.0003 | 0.0003 |
| MaxWind_y | 0.0000 | 0.0000 | 0.0026 | 0.0000 | 1.0000 | 0.0000 | 0.0014 | 0.0000 | 0.0000 |
| AvgT_y | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.0161 | 0.0000 | 0.0000 |
| month_y | 0.0000 | 0.0000 | 0.0002 | 0.0714 | 0.0028 | 0.0000 | 1.0000 | 0.0067 | 0.0067 |
| summer_y | 0.0005 | 0.0017 | 0.5924 | 0.0372 | 0.5651 | 0.0006 | 0.0380 | 1.0000 | 1.0000 |
| winter_y | 0.0005 | 0.0017 | 0.5924 | 0.0372 | 0.5651 | 0.0006 | 0.0380 | 1.0000 | 1.0000 |
sns.heatmap(grangers_causation_matrix(wTor_season, variables = wTor_season.columns),)
<Axes: >
adfVal = adfuller(wTor_season['AvgWind'], autolag = 'AIC')
print("1. ADF : ",adfVal[0])
print("2. P-Value : ", adfVal[1])
print("3. Num Of Lags : ", adfVal[2])
print("4. Num Of Observations", adfVal[3])
print("5. Critical Values :")
for key, val in adfVal[4].items():
    print("\t", key, ": ", val)
1. ADF :  -3.8860715870330726
2. P-Value :  0.0021397776318647873
3. Num Of Lags :  25
4. Num Of Observations 2186
5. Critical Values :
	 1% :  -3.433344965914077
	 5% :  -2.8628630765096195
	 10% :  -2.567474338973205
adfVal = adfuller(wTor_season['WindDir'], autolag = 'AIC')
print("1. ADF : ",adfVal[0])
print("2. P-Value : ", adfVal[1])
print("3. Num Of Lags : ", adfVal[2])
print("4. Num Of Observations", adfVal[3])
print("5. Critical Values :")
for key, val in adfVal[4].items():
    print("\t", key, ": ", val)
1. ADF :  -7.196902850352088
2. P-Value :  2.4213653692625174e-10
3. Num Of Lags :  25
4. Num Of Observations 2186
5. Critical Values :
	 1% :  -3.433344965914077
	 5% :  -2.8628630765096195
	 10% :  -2.567474338973205
adfVal = adfuller(wTor_season['month'], autolag = 'AIC')
print("1. ADF : ",adfVal[0])
print("2. P-Value : ", adfVal[1])
print("3. Num Of Lags : ", adfVal[2])
print("4. Num Of Observations", adfVal[3])
print("5. Critical Values :")
for key, val in adfVal[4].items():
    print("\t", key, ": ", val)
1. ADF :  -4.065605517183782
2. P-Value :  0.001103376080207832
3. Num Of Lags :  0
4. Num Of Observations 2211
5. Critical Values :
	 1% :  -3.433311062093479
	 5% :  -2.8628481063596647
	 10% :  -2.5674663683699017
forecasting_model = VAR(train[["AvgT","AvgWind","month","summer"]])
results_aic = []
for p in range(1, 20):
    results = forecasting_model.fit(p)
    results_aic.append(results.aic)
import seaborn as sns
sns.set()
plt.plot(list(np.arange(1,20,1)), results_aic)
plt.xlabel("Order")
plt.ylabel("AIC")
plt.show()
results = forecasting_model.fit(3)
results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Mon, 10, Apr, 2023
Time: 05:45:39
--------------------------------------------------------------------
No. of Equations: 4.00000 BIC: -0.400311
Nobs: 1987.00 HQIC: -0.492942
Log likelihood: -10682.6 FPE: 0.578848
AIC: -0.546717 Det(Omega_mle): 0.563944
--------------------------------------------------------------------
Results for equation AvgT
=============================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------
const 7.608364 0.774336 9.826 0.000
L1.AvgT 0.845558 0.022004 38.428 0.000
L1.AvgWind -0.292889 0.030697 -9.541 0.000
L1.month 0.276199 0.192248 1.437 0.151
L1.summer 3.023784 1.493871 2.024 0.043
L2.AvgT -0.216099 0.028714 -7.526 0.000
L2.AvgWind 0.015729 0.032157 0.489 0.625
L2.month -0.332979 0.269404 -1.236 0.216
L2.summer -0.648525 2.097457 -0.309 0.757
L3.AvgT 0.231758 0.021500 10.779 0.000
L3.AvgWind 0.004849 0.031265 0.155 0.877
L3.month 0.115392 0.191315 0.603 0.546
L3.summer 0.811987 1.502818 0.540 0.589
=============================================================================
Results for equation AvgWind
=============================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------
const 9.660679 0.569415 16.966 0.000
L1.AvgT -0.038082 0.016181 -2.354 0.019
L1.AvgWind 0.264740 0.022573 11.728 0.000
L1.month -0.073483 0.141372 -0.520 0.603
L1.summer -1.901257 1.098533 -1.731 0.084
L2.AvgT 0.046366 0.021115 2.196 0.028
L2.AvgWind -0.071137 0.023647 -3.008 0.003
L2.month 0.052594 0.198109 0.265 0.791
L2.summer 2.197419 1.542386 1.425 0.154
L3.AvgT -0.037697 0.015810 -2.384 0.017
L3.AvgWind 0.030912 0.022991 1.345 0.179
L3.month -0.029351 0.140686 -0.209 0.835
L3.summer -1.930401 1.105112 -1.747 0.081
=============================================================================
Results for equation month
=============================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------
const -0.070216 0.090612 -0.775 0.438
L1.AvgT 0.003451 0.002575 1.340 0.180
L1.AvgWind 0.000377 0.003592 0.105 0.916
L1.month 0.981712 0.022497 43.638 0.000
L1.summer -0.061008 0.174812 -0.349 0.727
L2.AvgT -0.003119 0.003360 -0.928 0.353
L2.AvgWind 0.000316 0.003763 0.084 0.933
L2.month -0.000436 0.031526 -0.014 0.989
L2.summer -0.014117 0.245444 -0.058 0.954
L3.AvgT 0.005199 0.002516 2.066 0.039
L3.AvgWind -0.002152 0.003659 -0.588 0.556
L3.month -0.004544 0.022388 -0.203 0.839
L3.summer -0.011203 0.175859 -0.064 0.949
=============================================================================
Results for equation summer
=============================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------
const -0.014721 0.011669 -1.262 0.207
L1.AvgT 0.000834 0.000332 2.515 0.012
L1.AvgWind 0.000319 0.000463 0.690 0.490
L1.month -0.001100 0.002897 -0.380 0.704
L1.summer 0.984267 0.022512 43.721 0.000
L2.AvgT -0.000175 0.000433 -0.404 0.686
L2.AvgWind -0.000171 0.000485 -0.354 0.724
L2.month -0.000103 0.004060 -0.025 0.980
L2.summer -0.000233 0.031608 -0.007 0.994
L3.AvgT 0.000089 0.000324 0.275 0.783
L3.AvgWind 0.000101 0.000471 0.214 0.830
L3.month -0.000548 0.002883 -0.190 0.849
L3.summer -0.014278 0.022647 -0.630 0.528
=============================================================================
Correlation matrix of residuals
AvgT AvgWind month summer
AvgT 1.000000 -0.093361 0.025780 0.019558
AvgWind -0.093361 1.000000 -0.008817 0.015449
month 0.025780 -0.008817 1.000000 0.003461
summer 0.019558 0.015449 0.003461 1.000000
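As a supplementary residual check (not in the original analysis), the Durbin-Watson statistic can flag remaining autocorrelation in each equation's residuals; values near 2 suggest none. A minimal sketch:
from statsmodels.stats.stattools import durbin_watson
# One statistic per VAR equation; ~2 means little residual autocorrelation
dw = durbin_watson(results.resid)
for col, val in zip(["AvgT", "AvgWind", "month", "summer"], dw):
    print(col, ":", round(val, 2))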
# The VAR was fit with 3 lags, so the last 3 observations seed the forecast
prediction = results.forecast(y=train[["AvgT","AvgWind","month","summer"]].values[-3:], steps=30)
resultsDF = pd.DataFrame(data=prediction, columns=["AvgT","AvgWind","month","summer"]) # storing predictions in a dataframe
resultsDF.set_index(test[:30].index, inplace=True)
resultsDF["AvgT"].plot()
train.AvgT[1900:].plot()
test.AvgT[:30].plot()
plt.show()
model_performance(test.AvgT[:30].values,resultsDF["AvgT"].values)
RMSE : 6.39
Normalized RMSE : 0.39
resultsDF["AvgWind"].plot()
train.AvgWind[1900:].plot()
test.AvgWind[:30].plot()
plt.show()
model_performance(test.AvgWind[:30].values,resultsDF["AvgWind"].values)
RMSE : 3.62
Normalized RMSE : 0.25
Importing Ontario province data with multiple spatial stations
df0 = pd.read_excel("/content/Ontario daily.xlsx")
df0.head()
| station | day | max_temp_f | min_temp_f | avg_wind_speed_kts | avg_wind_drct | max_wind_gust_kts | |
|---|---|---|---|---|---|---|---|
| 0 | CWQP | 2017-01-01 | 39 | 27 | 12.652174 | 268.6966 | 41 |
| 1 | CWBE | 2017-01-01 | 35.2 | 16.2 | 7.73913 | 258.47382 | 25 |
| 2 | CWMZ | 2017-01-01 | 36.3 | 29.7 | 22.086956 | 259.42496 | 42 |
| 3 | CYTZ | 2017-01-01 | 37.4 | 32 | 11.956522 | 245.61368 | 28 |
| 4 | CWPS | 2017-01-01 | 36.7 | 30.2 | 8.782609 | 219.92087 | 30 |
df2 = df0.copy()
Creating a list of station names
stationsUsed = ["CXTO", "CYTZ", "CYYZ", "CYKZ", "CYOO", "CYKF", "CXHM","CWWB", "CYHM", "CXVN","CWWZ", "CYSN", "CZEL"]
df2 = df2[df2.station.isin(stationsUsed)]
df2.head()
| station | day | max_temp_f | min_temp_f | avg_wind_speed_kts | avg_wind_drct | max_wind_gust_kts | |
|---|---|---|---|---|---|---|---|
| 3 | CYTZ | 2017-01-01 | 37.4 | 32 | 11.956522 | 245.61368 | 28 |
| 11 | CWWB | 2017-01-01 | 37.9 | 31.5 | 5.826087 | 230.53685 | 21 |
| 13 | CYHM | 2017-01-01 | 35.6 | 24.8 | 8.652174 | 240.98889 | 21 |
| 21 | CWWZ | 2017-01-01 | 40.5 | 30.9 | 7.043478 | 249.58049 | 35 |
| 29 | CYSN | 2017-01-01 | 39.2 | 28.4 | 9.142858 | 228.71013 | 17 |
Checking for Null values
def checkforNulls(df):
    for i in df.columns:
        print("\nColumn name : {}".format(i))
        count = 0
        for j in df[i]:
            if j == "None":
                count = count + 1
        print("Null count : ", count)
checkforNulls(df2)
Column name : station  Null count : 0
Column name : day  Null count : 0
Column name : max_temp_f  Null count : 1914
Column name : min_temp_f  Null count : 1914
Column name : avg_wind_speed_kts  Null count : 1620
Column name : avg_wind_drct  Null count : 1620
Column name : max_wind_gust_kts  Null count : 10780
df2.drop(['max_wind_gust_kts'],axis=1, inplace=True)
checkforNulls(df2)
Column name : station  Null count : 0
Column name : day  Null count : 0
Column name : max_temp_f  Null count : 1914
Column name : min_temp_f  Null count : 1914
Column name : avg_wind_speed_kts  Null count : 1620
Column name : avg_wind_drct  Null count : 1620
df2 = df2.replace('None', np.nan)
Using Forward fill for null values
df2.ffill(inplace=True)
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 28496 entries, 3 to 197264
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   station             28496 non-null  object
 1   day                 28496 non-null  datetime64[ns]
 2   max_temp_f          28496 non-null  float64
 3   min_temp_f          28496 non-null  float64
 4   avg_wind_speed_kts  28496 non-null  float64
 5   avg_wind_drct       28496 non-null  float64
dtypes: datetime64[ns](1), float64(4), object(1)
memory usage: 1.5+ MB
df2.columns = ["station","day","MaxT","MinT","AvgWind","WindDir"]
df2.head()
| station | day | MaxT | MinT | AvgWind | WindDir | |
|---|---|---|---|---|---|---|
| 3 | CYTZ | 2017-01-01 | 37.4 | 32.0 | 11.956522 | 245.61368 |
| 11 | CWWB | 2017-01-01 | 37.9 | 31.5 | 5.826087 | 230.53685 |
| 13 | CYHM | 2017-01-01 | 35.6 | 24.8 | 8.652174 | 240.98889 |
| 21 | CWWZ | 2017-01-01 | 40.5 | 30.9 | 7.043478 | 249.58049 |
| 29 | CYSN | 2017-01-01 | 39.2 | 28.4 | 9.142858 | 228.71013 |
Adding Average temperature column
def avg_temp(day):
    return (day.MaxT + day.MinT) / 2
df2['AvgT'] = df2.apply(avg_temp,axis=1)
df2.head()
| station | day | MaxT | MinT | AvgWind | WindDir | AvgT | |
|---|---|---|---|---|---|---|---|
| 3 | CYTZ | 2017-01-01 | 37.4 | 32.0 | 11.956522 | 245.61368 | 34.7 |
| 11 | CWWB | 2017-01-01 | 37.9 | 31.5 | 5.826087 | 230.53685 | 34.7 |
| 13 | CYHM | 2017-01-01 | 35.6 | 24.8 | 8.652174 | 240.98889 | 30.2 |
| 21 | CWWZ | 2017-01-01 | 40.5 | 30.9 | 7.043478 | 249.58049 | 35.7 |
| 29 | CYSN | 2017-01-01 | 39.2 | 28.4 | 9.142858 | 228.71013 | 33.8 |
df_avgT = df2.groupby('day').agg({'AvgT':list})
df_avgT.head()
| AvgT | |
|---|---|
| day | |
| 2017-01-01 | [34.7, 34.7, 30.200000000000003, 35.7, 33.8, 2... |
| 2017-01-02 | [27.5, 32.9, 30.2, 30.200000000000003, 33.8, 3... |
| 2017-01-03 | [34.55, 37.400000000000006, 40.1, 39.0, 37.400... |
| 2017-01-04 | [28.299999999999997, 29.3, 28.4, 29.2, 29.3, 2... |
| 2017-01-05 | [10.3, 18.5, 17.700000000000003, 17.8, 19.4, 1... |
df_avgT_F = pd.DataFrame(df_avgT['AvgT'].tolist(),index=df_avgT.index).add_prefix("s")
df_avgT_F.head()
| s0 | s1 | s2 | s3 | s4 | s5 | s6 | s7 | s8 | s9 | s10 | s11 | s12 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| day | |||||||||||||
| 2017-01-01 | 34.70 | 34.7 | 30.2 | 35.7 | 33.8 | 28.40 | 21.55 | 33.35 | 33.80 | 34.25 | 32.00 | 31.1 | 22.10 |
| 2017-01-02 | 27.50 | 32.9 | 30.2 | 30.2 | 33.8 | 32.00 | 35.55 | 36.50 | 35.25 | 33.65 | 37.15 | 32.9 | 26.95 |
| 2017-01-03 | 34.55 | 37.4 | 40.1 | 39.0 | 37.4 | 35.60 | 35.60 | 36.50 | 41.00 | 37.85 | 39.05 | 38.5 | 38.75 |
| 2017-01-04 | 28.30 | 29.3 | 28.4 | 29.2 | 29.3 | 26.60 | 23.55 | 28.40 | 24.80 | 25.70 | 29.90 | 28.4 | 26.60 |
| 2017-01-05 | 10.30 | 18.5 | 17.7 | 17.8 | 19.4 | 19.15 | 14.90 | 18.50 | 17.60 | 15.80 | 13.10 | 15.8 | 21.40 |
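The list-aggregation above relies on every day's rows appearing in the same station order; a pivot makes the station-to-column mapping explicit (columns become station codes instead of the positional s0..s12). A minimal alternative sketch:
# Wide layout keyed by station code; pivot_table averages any duplicate (day, station) rows
df_avgT_wide = df2.pivot_table(index='day', columns='station', values='AvgT')
df_avgT_wide.head()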
df_AvgWind = df2.groupby('day').agg({'AvgWind':list})
df_AvgWind_F = pd.DataFrame(df_AvgWind['AvgWind'].tolist(),index=df_AvgWind.index).add_prefix("s")
df_AvgWind_F.head()
| s0 | s1 | s2 | s3 | s4 | s5 | s6 | s7 | s8 | s9 | s10 | s11 | s12 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| day | |||||||||||||
| 2017-01-01 | 11.956522 | 5.826087 | 8.652174 | 7.043478 | 9.142858 | 7.152899 | 6.652174 | 4.173913 | 7.521739 | 0.000000 | 9.434783 | 6.215942 | 7.656450 |
| 2017-01-02 | 7.354210 | 7.119318 | 6.972464 | 9.639131 | 5.761905 | 8.960145 | 8.826087 | 11.342960 | 7.260870 | 3.869565 | 0.000000 | 2.521739 | 8.652174 |
| 2017-01-03 | 5.956522 | 6.471014 | 8.441240 | 6.521739 | 6.823188 | 5.476720 | 6.645025 | 8.400993 | 4.891270 | 6.565218 | 3.217391 | 0.000000 | 2.869565 |
| 2017-01-04 | 9.869565 | 15.086957 | 0.000000 | 12.043478 | 20.355797 | 19.548552 | 18.956522 | 19.197826 | 20.135078 | 11.676087 | 17.869566 | 16.746826 | 13.411594 |
| 2017-01-05 | 14.608696 | 0.000000 | 7.478261 | 11.869565 | 18.696377 | 9.695652 | 13.296377 | 11.369842 | 12.605797 | 11.336232 | 14.329710 | 14.718841 | 14.782609 |
df_avgT_F.to_csv('df_avgT_F', index=False, header=False)
df_AvgWind_F.to_csv('df_AvgWind_F', index=False, header=False)
Here we will use the spatial data of all 13 stations and forecast for station 1 (column s0).
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense
data = pd.read_csv('/content/df_avgT_F', header=None)
data.shape
(2192, 13)
Creating model for forecasting temperature using spatial and temporal data
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
from tensorflow.keras.callbacks import EarlyStopping
# Loading the data
data = pd.read_csv("/content/df_avgT_F", header=None)
# Setting the first column as the target variable
y = data.iloc[:, 0].values
# Set the remaining columns as the input features
X = data.iloc[:, 1:].values
# Splitting the data into training and testing sets: with 2192 rows, an unshuffled
# 2% holdout keeps roughly the last 44 days for testing (the plots below show the first 30)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.02, shuffle=False)
# Normalizing the input features
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Reshaping the data to be 3-dimensional for input into the LSTM layer
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
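# Note: each "sequence" fed to the LSTM is the 12 neighbouring stations' same-day
# readings (one step per station, one feature per step), so the recurrence runs
# across stations rather than across time here.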
# Define the model architecture
model = Sequential()
model.add(LSTM(units=128, input_shape=(X_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(Dense(units=1))
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Define early stopping to prevent overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=5, mode='min', restore_best_weights=True)
# Train the model for up to 50 epochs; early stopping may halt training sooner
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, callbacks=[early_stopping])
# Evaluate the model on the test set
mse = model.evaluate(X_test, y_test)
print("Mean squared error on test set:", mse)
Epoch 1/50 54/54 [==============================] - 8s 20ms/step - loss: 1879.8627 - val_loss: 1274.8625
Epoch 2/50 54/54 [==============================] - 0s 7ms/step - loss: 1069.4657 - val_loss: 894.7989
[... training log abridged: validation loss falls steadily, reaching its best value of 15.9492 at epoch 35 ...]
Epoch 40/50 54/54 [==============================] - 0s 7ms/step - loss: 21.4750 - val_loss: 19.1370
2/2 [==============================] - 0s 8ms/step - loss: 9.9779
Mean squared error on test set: 9.97788143157959
y_pred = model.predict(X_test)
2/2 [==============================] - 1s 6ms/step
Evaluating the model performance
model_performance(y_test, y_pred)  # helper defined earlier in the notebook; prints the metrics and returns None
RMSE : 10.97 Normalized RMSE : 0.25
plt.figure(figsize=(16,4))
plt.plot(y_test[:30], label="Actual")
plt.plot(y_pred[:30], label="Predicted")
plt.title('Average Temperature forecast', fontsize=20)
plt.ylabel('Average Temperature (°F)', fontsize=16)
plt.legend()
[Figure: actual vs. predicted average temperature over the first 30 test days]
Creating a model to forecast average wind speed using spatial and temporal data
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
from tensorflow.keras.callbacks import EarlyStopping
# Loading the data
data = pd.read_csv("/content/df_AvgWind_F", header=None)
# Setting the first column as the target variable
y = data.iloc[:, 0].values
# Set the remaining columns as the input features
X = data.iloc[:, 1:].values
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.02, shuffle=False)
# Normalizing the input features
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Reshape the data to be 3-dimensional for input into the LSTM layer
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
# Defining the model architecture
model = Sequential()
model.add(LSTM(units=128, input_shape=(X_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(Dense(units=1))
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Define early stopping to prevent overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=5, mode='min', restore_best_weights=True)
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, callbacks=[early_stopping])
# Evaluate the model on the test set
mse = model.evaluate(X_test, y_test)
print("Mean squared error on test set:", mse)
Epoch 1/50 54/54 [==============================] - 3s 12ms/step - loss: 32.4991 - val_loss: 15.7069
Epoch 2/50 54/54 [==============================] - 0s 5ms/step - loss: 13.0806 - val_loss: 13.6851
[... training log abridged: validation loss plateaus near 11.4, with its best value of 11.3854 at epoch 10 ...]
Epoch 15/50 54/54 [==============================] - 0s 5ms/step - loss: 11.6320 - val_loss: 11.5410
2/2 [==============================] - 0s 8ms/step - loss: 17.7951
Mean squared error on test set: 17.79507827758789
y_pred = model.predict(X_test)
2/2 [==============================] - 0s 6ms/step
model_performance(y_test, y_pred)  # prints the metrics and returns None
RMSE : 5.98 Normalized RMSE : 0.26
plt.figure(figsize=(16,4))
plt.plot(y_test[:30], label="Actual")
plt.plot(y_pred[:30], label="Predicted")
plt.title('Average Wind Speed forecast', fontsize=20)
plt.ylabel('Average Wind Speed (kts)', fontsize=16)
plt.legend()
[Figure: actual vs. predicted average wind speed over the first 30 test days]
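The wind pipeline above repeats the temperature pipeline line for line, so a small helper would keep the two runs consistent if the notebook grows. A minimal sketch reusing the imports above, under the same assumptions (first column is the target station, the rest are neighbours; train_station_lstm, model_T and mse_T are illustrative names):

def train_station_lstm(csv_path, epochs=50):
    """Fit the same LSTM architecture used above on any day x station matrix."""
    data = pd.read_csv(csv_path, header=None)
    y = data.iloc[:, 0].values    # target station (s0)
    X = data.iloc[:, 1:].values   # neighbouring stations
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.02, shuffle=False)
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train).reshape(-1, X.shape[1], 1)
    X_test = scaler.transform(X_test).reshape(-1, X.shape[1], 1)
    model = Sequential([
        LSTM(128, input_shape=(X.shape[1], 1)),
        Dropout(0.2),
        Dense(1),
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(X_train, y_train, epochs=epochs, batch_size=32,
              validation_split=0.2, verbose=0,
              callbacks=[EarlyStopping(monitor='val_loss', patience=5,
                                       mode='min', restore_best_weights=True)])
    return model, model.evaluate(X_test, y_test, verbose=0)

# Usage: model_T, mse_T = train_station_lstm('/content/df_avgT_F')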